-
Notifications
You must be signed in to change notification settings - Fork 4.4k
Fix get_value_estimate and buffer append #2276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| value_next = self.policy.get_value_estimates( | ||
| bootstrapping_info, | ||
| idx, | ||
| info.local_done[l] and not info.max_reached[l], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to determine this just from bootstrapping_info and idx within get_value_estimates()? I couldn't quite convince myself when I was looking at it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm not sure - it seems like bootstrapping_info becomes the previous info if the not condition is met. Seems like we'll run into an issue if both of those conditions are met, since bootstrapping_info will be sth different and we can't figure out if info.local_done is True.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, that's more or less what I thought. Sounds good as it is!
|
Note that this should fix #1798 |
| return run_out | ||
|
|
||
| def get_value_estimates(self, brain_info, idx): | ||
| def get_value_estimates(self, brain_info, idx, done): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type annotations if it's not too hard to add?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added type annotation and test for this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LTGM, left some optional feedback.
* develop: (69 commits) Add different types of visual encoder (nature cnn/resnet) Make SubprocessEnvManager take asynchronous steps (#2265) update mypy version one more unused remove unused variables Fix respawn part of BananaLogic (#2277) fix whitespace and line breaks remove codacy (#2287) Ported documentation from other branch tennis reset parameter implementation ported over Fixed the default value to match the value in the docs two soccer reset parameter implementation ported over 3D ball reset parameter implementation ported over 3D ball reset parameter implementation ported over Relax the cloudpickle version restriction (#2279) Fix get_value_estimate and buffer append (#2276) fix lint checks Add Unity command line arguments Swap 0 set and reward buffer append (#2273) GAIL and Pretraining (#2118) ...
In newer versions of numpy (>1.14), the fact that we convert a np array into a list, append an np array, and convert it back to a list in
get_gaecauses problems.This commit changes
get_value_estimatesto output adictof floats rather than np arrays, and uses np.append rather than converting to a list and back to resolve this issue.